ABSTRACT

Sociological studies on transnational migration are often
based on surveys or interviews, an expensive and
time-consuming approach. On the other hand, the pervasiveness
of mobile phones and location-aware social networks has
introduced new ways to understand human mobility patterns at
a national or global scale. In this work, we leverage
geo-located information obtained from Twitter to understand
transnational migration patterns between two border cities
(San Diego, USA and Tijuana, Mexico). We obtained 10.9M
geo-located tweets from December 2013 to January 2015.
Our method infers human mobility by inspecting tweet
submissions and users’ home locations. Our results depict a
transnational community structure that exhibits the formation
of a functional metropolitan area that physically transcends
international borders. These results show the potential
for re-analyzing sociological phenomena from a
technology-based empirical perspective.
INTRODUCTION

Historically, international borders have denoted a clash of
cultures, races, economies and governments. Toward the
end of the twentieth century, the emergence of globalization
and planet-scale communications has reduced the function
of borders from trade barriers to deterrents of
human migration. This is especially true in the case of the
U.S.-Mexico border, the ninth longest frontier in the world.
With over a million people crossing daily, this international
border is considered the busiest in the world, as well as one
of the most contrasting frontiers [36, 50].


The largest U.S.-Mexico border metropolitan region is the
San Diego-Tijuana region. It is at the most southwestern
point of the U.S. and the most northwestern point of Mexico.
Currently, it has a combined population of over 4.0 million
people, anticipated to grow to over 5.5 million by 2020
[45, 52]. San Diego and Tijuana present a unique relationship
because of their extreme income difference (with a ratio of
6.4 to 1) and marked economic inequality (San Diego’s
economy being 11 times greater than Tijuana’s) [6]. The main
source of interaction between these cities is transnational
migration (i.e., people who live in one country and work, on a
daily basis, in the other), even though only half the
population of Tijuana has the legal right to cross the border
[1], and even fewer have the legal right to work in the U.S.
Nearly all of these migrants live in Tijuana and work just
across the border [6]. According to Alegría [1], 8% of San
Diego’s total workforce were migrants from Tijuana, almost all
of them men in their thirties with secondary education. Nearly
all of this migration goes through the San Ysidro Land Port of
Entry (LPOE), which is open 24 hours a day, seven days a week,
making it one of the most transited land ports in North
America. It currently processes approximately 50,000 vehicles
and 26,000 pedestrians per day, making it a bottleneck in the
system of interchange between the two countries, increasingly
restricting the movement of passenger vehicles during peak
times [58].


During the day, commuters crossing the San Ysidro LPOE are
either going to work or returning from it; late at night on
weekends, a population of young adult northerners returns
from the bar and nightclub districts of Tijuana.


From a sociological perspective, the San Diego-Tijuana urban
context has been explained as twin cities [34, 59] or
bi-national spaces [23, 57]. Lawrence Herzog [29, 30]
has proposed the concept of the trans-border metropolis, a
functional metropolitan area that physically transcends
international borders, and where urban management
can only be achieved through a combination of city planning
and international diplomacy. Moreover, the concept of the
trans-border metropolis appears to be generalizable to all
border city pairs [1]. Nonetheless, these results typically
rely on data obtained from surveys and small-group
observation, an approach that is generally both
time-consuming and expensive. On the other hand, the
pervasiveness of mobile phones and location-aware social
networks (such as Twitter or Foursquare) has introduced new
ways to understand human mobility patterns at a national or
global scale [3, 7, 10, 27, 48].


In this work, we leverage geo-located information obtained
from Twitter to provide an empirical basis on which to test for
the existence of a sociological construct. Geo-located
information from Twitter has previously been explored in the
CSCW community to understand the restrictions in
communications between different communities around the
globe [23], to study how vulnerable users communicate during
crises [35, 44], and to classify Twitter users according to
their information production and consumption. However, to
the best of our knowledge, no work has used geographical
information to explain transnational migrations and
transnational communities.


Specifically, we focus on the following research questions: 

1. Within the context of geo-located social networks, is there
any evidence that Tijuana and San Diego act like a
trans-border metropolis?

2. How do the international borders affect the mobility of the
U.S.-Mexico border metropolitan region?

This work offers two main contributions: (i) proposing an
improved approach (similar to Cranshaw’s [17]) to translate
social-network information into a graph-theoretic framework,
and (ii) providing an analysis, based on graph theory, that
is capable of generating additional insights into the
transnational migration and transnational community structure
in the U.S.-Mexico border region. These methods and results
provide a promising step for the field of Social Computing*,
specifically in examining online socio-behavioral phenomena.
Our analysis and the conclusions drawn from it present
decision-makers with a cost-effective and time-saving way of
sensing a transnational urban environment in order to inform
the design of public policy and international diplomacy.

In the following sections we survey previous work on the
San Diego-Tijuana region from a sociological
perspective, and work on the exploration of urban
dynamics using social network information. We also explain
our method to transform geo-located social network data into
a mathematical representation. Next, we present our
graph analysis and community division for San Diego-Tijuana
mobility, and a simple validation of our results
based on official reports on the state of San Diego’s
infrastructure. Finally, we draw our conclusions and outline
possible future work.


RELATED WORK 


Social studies of the border 


The relation between border city pairs has been thoroughly
studied from a sociological perspective. For example, Herzog
[29, 30] argues that demographic explosions on the border
have given rise to functional metropolitan areas that
physically transcend international borders. This new urban
environment comes with a new set of problems that are not
confined to a single nation, but extend across international
boundaries. Furthermore, Anderson and O’Dowd claim that
borders are no longer peripheral regions in relation to
national centers, but are instead potential poles of economic
growth. Border regions thus gain some independence from their
national capitals when it comes to policy, and become more
likely to work with their neighbors across the border in
developing economic, institutional, and public infrastructure.

*As defined by Schuler [54], Social Computing refers to “any type
of computing application in which software serves as an
intermediary or a focus for a social relation”.

Viswanathan et al. [59] examine the patterns of social
distribution using 16 key variables from the U.S. and Mexican
censuses. They found that San Diego’s urban layout, unlike
that of most cities in the U.S., does not revolve around a
central business district. Instead, it shows a high level of
commonality with Tijuana, due to the mixing and change factors
caused by nearness to the border.

Rubin-Kurtzman et al. [51] study the social and economic
impacts of the southern California trans-border urban system.
Their results show that the trans-border economic and
technological disparities, as well as the mobility of labor
and capital between southern California and Baja California,
affect population flows and the composition of the labor force
on both sides of the border. Furthermore, the authors state
that “[m]igration and trans-border mobility are the keys to
demographic behavior in the region because migration is the
principal determinant of population growth in Southern
California and Baja California region”.

Sparrow discusses the territorial integration of San Diego
and Tijuana. On the physical level, multiple roads and
highways closely link these cities. Thus, from the air, this
region can be considered one continuous urban agglomeration.
On a behavioral level, there is a significant amount of work,
shopping, social and touristic integration of the populations.
However, there is minimal integration on the communications
level, only limited cultural exchange, and little to
no bi-national integration on a politico-administrative level.
Sparrow also argues that San Diego and Tijuana still have a
long way to go before they can be considered bi-national
cities.

Social Networks as Urban Sensors 

On the topic of using social networks as urban sensors,
several works have shown that it is possible to consider social
media participants as virtual proxies for human behavior and
mobility. For example, many works have used millions of
check-ins posted in Foursquare, a social-network site that
allows users to share their locations and the places they visit
with a group of friends. This geo-located data has been used to
explain recurring patterns in human mobility [16], to
understand the underlying social aspects of mobility, to study
the relation between distance and strength of social ties [53],
or to identify groups of people and the places they go [32].

Cranshaw et al. [17] propose a novel approach to visualize
and investigate the dynamics, structure, and character of a
city on a large scale. Their method clusters Foursquare data
from the city of Pittsburgh, PA into Livehoods, an urban
division that considers both the type of place and the people
living and working within. This results in dynamic urban
divisions that change with people’s behaviors and are not
dependent on politics or arbitrarily set divisions. Our work
differs from Cranshaw’s in two ways. First, although Livehoods
can be presented as an undirected weighted graph, their results
do not rely on any graph analysis. In this work, we employ
graph centrality measurements to further explore the
relationships between neighborhoods. Second, whereas they study
mobility from a place-centered perspective, analyzing venues
and the users moving across them, we study the same
phenomena from a user-centered perspective, by analyzing user
movement across different places. However, we believe this
difference only affects the possible interpretation of the
results and not the results themselves.


Due to its popularity, and data access through a public API,
Twitter has become one of the major sources in several works
on human mobility and event detection. Some of the surveyed
works have focused on the representativeness of geo-tagged
data and its interaction with the underlying mobility dynamics
[47, 33], while others seek to understand the relationships
between geographical regions, neighborhoods and criminal
behaviors.


Hawelka et al. [27] attempt to validate the
representativeness of geo-located Twitter data as a global
source for mobility data. Their work seeks to discover spatial
patterns and clusters of regional mobility using a year of
captured tweets (944 million geo-located messages). Each user
is assigned to the country where he or she posted the most
tweets, and is considered mobile if he or she issued a tweet in
another country within the year. The authors build a
directional country-to-country network of human travels, which
enabled them to quantify both the inflow and outflow of
visitors. Following the work of Sobolevsky and Newman
[56, 41], the network was split into communities or modules.
The results show that travel connections between North and
South America were stronger than those between America and
Europe. Moreover, they were able to detect communities inside
North America, South America, and West and East Europe.


Analyzing information from social networks has been a
recurring topic in CSCW. For example, in CSCW 2004, Goecks
and Mynatt [25] presented Saori, a computational
infrastructure that leverages social networks to mediate
information dissemination, allowing users to share semi-public
information (such as work products or their geographical
location) with a small group of people. Incidentally, their
results show that social-network ties that extend across a
border (e.g., organizational borders or political borders) are
similar to those that exist between acquaintances or
colleagues, and people in shared interest groups.

In CSCW 2014, Garcia-Gavilanes et al. [23] showed that
the international Twitter communication landscape was still
largely dominated by geographical, economic and
sociocultural restrictions. Their work analyzes 13 million
users spread over hundreds of countries. Their results show
that language barriers, cultural factors dealing with
intolerance, and the fear of the unfamiliar are the strongest
deterrents to successful collaborative work.

In CSCW 2015, Kogan et al. [35] studied how retweeting
activity (reposting or forwarding a message produced by
another user) by geographically vulnerable users (e.g., those
affected by Hurricane Sandy in 2012) differed from that of the
general Twitter population. Their work represents retweet
activity as a directed graph, with reposters as source nodes,
the original posters as target nodes, and directed edges from
source to target representing retweets. They analyze the graph
structure by looking at network size and density, degree
distributions, and PageRank centrality. Their results show
that hubs tend to form more during the disaster than
afterwards; that geographically vulnerable users have denser
interconnected networks during disasters than before or after;
and that they tend to retweet information with more local
utility than their non-vulnerable counterparts, who are more
interested in the general picture.

In this work, we extend these results by analyzing mobility
as registered in social networks, with the aim of
understanding the phenomenon of transnational migration.
We also work upon a graph where nodes represent
geographical zones and edges aggregate the number of persons
living in one zone and moving to another. However, since we
do not study mobility at a global scale, our zones cannot be
whole countries, and have to be defined in a more
local sense. In order to do so, we leverage the home-location
inference proposed by Bora et al., with the addition of a
second clustering phase, to obtain a data-driven division of
the urban environment into neighborhoods of similar density.
Thus, a user belongs to a neighborhood if his or her home is
within the limits defined by that neighborhood.

DATA AND METHODOLOGY

We used a collection of 10,908,817 geo-located tweets. Each
tweet had a unique identifier, date-time of submission,
coordinates of submission (i.e., GPS latitude and longitude as
reported by smartphones), and the content of the message.
Tweets were captured using Twitter’s Streaming API,² from
December 4, 2013 until January 13, 2015. A bounding box
(32° 25’ 4.2414” N, 117° 18’ 49.5066” W and 33°
5’ 53.3178” N, 116° 49’ 17.9142” W) was used to keep only
those messages originating from San Diego and Tijuana.
There are inherent limitations to our collection of data. The
Twitter API provides access to approximately 1% of all
tweets [44]. However, [39] shows that the data obtained from
the API closely resembles a random sample drawn from the
full Twitter stream.

After discarding all information except the tweet coordinates
and the user identifier, the procedure is as follows:

1. Infer each user’s home location.

2. Cluster home locations into neighborhoods of similar
density.

3. Create a graphical representation of the neighborhoods and
users moving within.
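
As an illustration of the bounding-box filter used during collection, the following minimal sketch converts the corner coordinates quoted in the text from degrees/minutes/seconds to decimal degrees and tests whether a point falls inside the study area. The function names and the two sample points are hypothetical and are not part of the original pipeline.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South and West are negative by convention.
    return -value if hemisphere in ("S", "W") else value


# Corners of the bounding box quoted in the text.
SOUTH = dms_to_decimal(32, 25, 4.2414, "N")
WEST = dms_to_decimal(117, 18, 49.5066, "W")
NORTH = dms_to_decimal(33, 5, 53.3178, "N")
EAST = dms_to_decimal(116, 49, 17.9142, "W")


def in_bounding_box(lat, lon):
    """Return True when a coordinate pair lies inside the study area."""
    return SOUTH <= lat <= NORTH and WEST <= lon <= EAST


# Hypothetical sample points (not taken from the dataset).
print(in_bounding_box(32.7157, -117.1611))  # downtown San Diego
print(in_bounding_box(34.0522, -118.2437))  # Los Angeles
```

Under these assumptions the first point is accepted and the second is rejected, since Los Angeles lies north of the upper latitude bound.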


² https://dev.twitter.com/




Inference of a user’s home 

Previous work shows that it is possible to infer a user’s home
location using their historical Twitter data. For example,
Hecht et al. [28] performed an in-depth study of user behavior
with regard to the user location field in Twitter. They found
that although a large percentage of users did not specify their
location beyond a city-level scale, it was possible to infer
their state and city using machine-learning techniques. In
another example, Bora et al. [10] assume that users are more
likely to tweet from their home at night. Their method starts
by filtering tweets between 7:00pm and 4:00am, then applies a
single pass of the DBSCAN clustering algorithm [20]. The
center of the largest cluster is used as the exact coordinates
for the user’s home.

This work follows up on Bora’s assumption, with a
minor modification: filtering tweets captured from 10:00pm to
4:00am. This new time range represents a better overlap
between the American and the Mexican working cycles, since
the Mexican working cycle typically extends much later than
its American counterpart [9].
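
The night-time filter can be sketched as follows. This is only an illustration: the record layout is hypothetical, timestamps are assumed to be already expressed in local time, and treating 4:00am as an exclusive bound is an assumption that the text does not settle.

```python
from datetime import datetime

NIGHT_START_HOUR = 22  # 10:00pm, as stated in the text
NIGHT_END_HOUR = 4     # 4:00am, as stated in the text (assumed exclusive)


def is_night_tweet(timestamp):
    """True when a local timestamp falls in the 10:00pm-4:00am window."""
    hour = timestamp.hour
    # The window wraps around midnight, so it is a union of two ranges.
    return hour >= NIGHT_START_HOUR or hour < NIGHT_END_HOUR


# Hypothetical records: (user_id, local timestamp, latitude, longitude).
tweets = [
    ("u1", datetime(2014, 3, 1, 23, 15), 32.53, -117.03),
    ("u1", datetime(2014, 3, 2, 13, 40), 32.71, -117.16),
    ("u1", datetime(2014, 3, 3, 2, 5), 32.53, -117.03),
]
night_tweets = [t for t in tweets if is_night_tweet(t[1])]
print(len(night_tweets))  # 2
```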

Obtaining a neighborhood division 

Just like Cranshaw’s Livehoods, we obtain a neighborhood
division by applying an additional DBSCAN pass to our home
location data. By using DBSCAN, the resulting divisions are
roughly of the same population density. This approach to
producing data-driven separations of urban environments goes
beyond externally imposed boundaries, such as political
borough divisions or grid separations, which generally are
based on census tracts and geographic landmarks [17, 37]. Once
again, we use the center of each cluster as the exact
coordinates for each neighborhood’s location.
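
Both clustering passes rely on DBSCAN followed by a cluster-center computation. The sketch below is a minimal pure-Python version written for illustration only; it assumes plain Euclidean distance on latitude/longitude pairs measured in degrees and takes the arithmetic mean of the members as the cluster center, neither of which is confirmed by the text. The sample points and parameter values are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point (-1 marks noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def cluster_centers(points, labels):
    """Map each cluster label to (mean coordinate, member count)."""
    groups = {}
    for point, label in zip(points, labels):
        if label != -1:
            groups.setdefault(label, []).append(point)
    return {
        label: (tuple(sum(c) / len(members) for c in zip(*members)),
                len(members))
        for label, members in groups.items()
    }


# Hypothetical night-time tweet coordinates for a single user.
night_points = [(32.530, -117.030), (32.531, -117.031),
                (32.529, -117.029), (32.710, -117.160)]
labels = dbscan(night_points, eps=0.01, min_pts=3)
centers = cluster_centers(night_points, labels)
# The home estimate is the center of the most populated cluster.
home, size = max(centers.values(), key=lambda cs: cs[1])
print(labels, home, size)
```

For the neighborhood pass, the same two functions would be applied to the list of inferred home locations, keeping every cluster center rather than only the largest one.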

Graphical approach 

A user is assigned to the neighborhood whose center is the
closest to the user’s home location; a tweet is assigned to the
closest neighborhood by comparing the geographic coordinates
reported by Twitter to each of the neighborhoods’ centers.
Regardless of their time of submission, all tweets were
considered at this stage. All distances were calculated using
the Haversine formula for great-circle distances [55].

We construct a directed graph depicting users’ mobility. The
nodes in this graph represent neighborhoods, while the edge
weights count the number of people moving between them. In
other words, an edge from node x to node y has a weight w if
there are w persons whose home location lies within the limits
of neighborhood x and who tweeted at least once in
neighborhood y.
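
The assignment and graph-construction steps can be sketched as below. This is an illustrative sketch rather than the original implementation: the Earth radius constant, the data layout, the neighborhood names and the decision to skip tweets posted in the user's own neighborhood (i.e., to omit self-loops) are all assumptions.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_neighborhood(lat, lon, centers):
    """Name of the neighborhood whose center is closest to (lat, lon)."""
    return min(centers, key=lambda name: haversine_km(lat, lon, *centers[name]))


def build_mobility_graph(homes, tweets, centers):
    """Edge (x, y) -> number of distinct users living in x who tweeted in y."""
    visitors = defaultdict(set)
    for user, lat, lon in tweets:
        if user not in homes:
            continue
        x = nearest_neighborhood(*homes[user], centers)
        y = nearest_neighborhood(lat, lon, centers)
        if x != y:  # assumption: self-loops are not recorded
            visitors[(x, y)].add(user)
    return {edge: len(users) for edge, users in visitors.items()}


# Hypothetical neighborhoods, homes and tweets.
centers = {"A": (32.53, -117.03), "B": (32.71, -117.16)}
homes = {"u1": (32.531, -117.031), "u2": (32.709, -117.159)}
tweets = [("u1", 32.710, -117.160), ("u1", 32.712, -117.161),
          ("u2", 32.530, -117.030), ("u1", 32.530, -117.030)]
graph = build_mobility_graph(homes, tweets, centers)
print(graph)
```

Because each edge stores a set of users before counting, a user who tweets several times in the same destination neighborhood contributes only once to the edge weight, which matches the "at least once" wording above.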


Experimental Settings 

The present methodology introduces a way to obtain a graph
representation from a collection of geo-tagged messages,
such as tweets. However, this method greatly depends on the
choice of the DBSCAN parameters (i.e., $\varepsilon$ and
minPts), both for the clustering of an individual’s home
location and for the clustering of home locations into
neighborhoods.

Previous work uses a minPts parameter between 3 and 5
[20, 10, 46]. For the sake of considering as many users as
possible, we decided on using the minimum bound (minPts = 3)
for both clustering procedures.

Our current approach does not select a single pair of
$\varepsilon$ values; instead, it relies on a Monte Carlo
simulation over a diversity of possible parameters. We produced
five hundred pairs of $\varepsilon_1$ and $\varepsilon_2$ with
the following a priori distributions:

$$\varepsilon_1 \sim U(0.3, 1.0)$$
$$\varepsilon_2 \sim U(0, 0.15)$$

where $U(a, b)$ denotes a continuous uniform distribution
between $a$ and $b$. A uniform distribution was selected
because it is the simplest distribution that conveys no
previous knowledge of the underlying parameter distribution.
However, certain assumptions are made in order to simplify the
interpretation of the results.

The first assumption states that $\varepsilon_1$ must be
greater than $\varepsilon_2$, based on the notion that the
first clustering must coarsely capture the mobility of an
individual across the span of both cities, while the second
must finely distinguish between geographically close
neighborhoods. Thus, the distribution for $\varepsilon_1$ was
selected with a lower bound twice the upper bound of
$\varepsilon_2$.
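
The parameter sampling can be illustrated with a short sketch. The bounds are those stated in the text; the function name, the fixed seed and the use of Python's standard `random` module are assumptions made for reproducibility of the example.

```python
import random


def sample_epsilon_pairs(n_pairs=500, seed=0):
    """Draw (eps1, eps2) pairs from the two uniform priors in the text."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    pairs = []
    for _ in range(n_pairs):
        eps1 = rng.uniform(0.3, 1.0)   # coarse pass: per-user home inference
        eps2 = rng.uniform(0.0, 0.15)  # fine pass: neighborhood division
        pairs.append((eps1, eps2))
    return pairs


pairs = sample_epsilon_pairs()
print(len(pairs))
# The supports do not overlap, so eps1 > eps2 holds for every pair.
print(all(eps1 > eps2 for eps1, eps2 in pairs))
```

Because the lower bound of the first distribution lies above the upper bound of the second, the ordering constraint needs no rejection step.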

The second assumption states that, since the distance between
San Diego’s northernmost position and Tijuana’s southernmost
position is less than 112 km,³ all human mobility between
these cities must be physically restricted to this
measurement. Furthermore, considering that one degree of arc
along the Earth’s equator spans about 111.3 km,⁴ the maximum
distance between San Diego and Tijuana is roughly equal to 1°.
The selected upper bound for the distribution of
$\varepsilon_1$ reflects this assumption.
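
The arithmetic behind this upper bound can be written out explicitly, using the equatorial circumference of 40,075.017 km and the 112 km span given above:

```latex
\frac{40{,}075.017\ \text{km}}{360^{\circ}} \approx 111.32\ \text{km per degree},
\qquad
\frac{112\ \text{km}}{111.32\ \text{km per degree}} \approx 1.006^{\circ}
```

This treats one degree as the same length in every direction, which is a simplification: a degree of longitude is shorter than a degree of latitude at San Diego's latitude.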

Each ordered pair $(\varepsilon_1, \varepsilon_2)$ results in a
different way of assigning home locations to users and in
different neighborhood areas. Thus, for each user there are
500 possible home locations, and 500 ways of splitting the
space into neighborhoods. Unfortunately, this approach becomes
computationally prohibitive when considering that there would
have been over two trillion ($2 \times 10^{12}$) different
associations.

In order to reduce our problem space, we propose the use of
an information-based distance metric to select only those
data splits that provide the most informative separations.


Reducing problem space with an information-based distance

We drew a random sample of 109,088 tweets (1% of the
dataset) and applied our two clustering procedures to obtain
the 500 neighborhood divisions for this particular sample.
Then, we compared all divisions pairwise using variation of
information, an information-based distance metric.


³ Measured using Google Maps public API:
https://developers.google.com/maps/

⁴ Obtained by dividing the Earth’s circumference (40,075.017 km)
by 360°







Variation of information (VI) [38] measures the amount of
information lost or gained in changing from clustering $C$ to
clustering $C'$ (where a clustering is a set of clusters). The
measure takes an information-based approach, establishing how
much information there is in each clustering, and how much
information one clustering gives about the other. Formally, let
$C$ and $C'$ denote clusterings such that
$C = \{C_1, C_2, \ldots, C_K\}$ and
$C' = \{C'_1, C'_2, \ldots, C'_{K'}\}$. Also, let $n_k$
denote the size of cluster $C_k$, and let $n$ be the size of
the whole dataset $D$.

Assume that the probability of a point in dataset $D$ being in
cluster $C_k$ equals

$$P(k) = \frac{n_k}{n}$$

Then a clustering’s information entropy can be obtained as

$$H(C) = -\sum_{k=1}^{K} P(k) \log P(k) \qquad (1)$$

and the information that $C$ and $C'$ share as

$$I(C, C') = \sum_{k=1}^{K} \sum_{k'=1}^{K'} P(k, k') \log \frac{P(k, k')}{P(k)\,P(k')} \qquad (2)$$

where $P(k, k')$ represents the joint probability distribution
of the variables associated with the clusterings, that is, the
probability that a point belongs to $C_k$ in clustering $C$
and to $C'_{k'}$ in $C'$. Finally, VI can be defined using
equations (1) and (2) as

$$VI(C, C') = H(C) + H(C') - 2I(C, C') \qquad (3)$$
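
Equations (1) to (3) translate directly into code. The following sketch assumes that each clustering is given as a list with one label per data point, in the same point order, and that natural logarithms are used; both are assumptions, since the text does not fix the representation or the logarithm base.

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """VI between two clusterings given as per-point label sequences."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("both clusterings must label the same non-empty dataset")
    count_a = Counter(labels_a)                # n_k for clustering C
    count_b = Counter(labels_b)                # n_k' for clustering C'
    count_ab = Counter(zip(labels_a, labels_b))  # joint counts for P(k, k')

    def entropy(counts):
        # Equation (1): H = -sum P(k) log P(k)
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Equation (2): mutual information shared by the two clusterings.
    mutual = sum(
        (c / n) * math.log((c / n) / ((count_a[a] / n) * (count_b[b] / n)))
        for (a, b), c in count_ab.items()
    )
    # Equation (3): VI = H(C) + H(C') - 2 I(C, C')
    return entropy(count_a) + entropy(count_b) - 2 * mutual


same = variation_of_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
print(same, independent)
```

Identical clusterings give a distance of zero, while the two statistically independent two-way splits above give 2 log 2, the sum of their entropies.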



This process outputs a distance distribution over the
information contained in each division. For this work, only
the top 5% most informative comparisons were considered.
Between them, the top-5-percentile clusterings divided the
Tijuana-San Diego region into 5,628 areas. Finally, our
approach considers all these areas as part of a directed graph.
As mentioned earlier, these areas correspond to graph nodes,
and the edges sum up the number of people moving between them.

RESULTS

The mobility between San Diego and Tijuana can be
represented as a directed graph formed by 5,628 nodes with
212,572 connections between them. The graph was not
completely connected; instead, it was formed by 53 different
pieces (components). All nodes not connected to any other
node were eliminated, leaving the final graph with 5,576
vertices and 211,796 edges. Figure 1 shows the spatial
embedding of the mobility graph.

Figure 1. Spatial embedding of the mobility graph for Tijuana and San
Diego. 5,576 vertices formed the graph. For the sake of clarity, edges are
omitted.

Concentration of nodes in certain areas allowed for the
identification of some important San Diegan locations, for
example, the University of California at San Diego (UCSD), the
city’s downtown, Valley View Casino Center (formerly San Diego
Sports Arena), and Mission Beach. Figure 2 highlights the
main areas. On the Mexican side, this analysis was not as
successful. Nodes in the Tijuana area were scarce and fairly
spread out, forming concentrations of no more than 3 or 4
points at a time (see Figure 3).


Graph Analysis 

An analysis of node degree and betweenness centrality allows
for the determination of key points for understanding
trans-border mobility. Moreover, the fact that the graph was
not completely connected permits an analysis of the graph’s
modularity, a measure of how well the graph can be split into
communities, and hence a study of the transnational migration
patterns.

Node degree 

Weighted in-degree (i.e., the number of people arriving at
each node) and weighted out-degree (i.e., the number of
people leaving each node) were obtained. Weighted in-degree
had a distribution with minimum 1, maximum 58,195, and
median 115. Weighted out-degree ranged from 1 to 62,433 with a
median of 112. There was no statistically significant
difference between the two distributions (KS test,
KS = 0.0204, p > 0.05). The node with the highest in-degree
was located in the northernmost part of the Coronado
peninsula, corresponding to Naval Air Station North Island,
and the node with the highest out-degree was at
32° 41’ 7.8” N 117° 02’ 22.0” W, which does not
directly point to any landmark. We hypothesize that such a
high out-degree was due to the proximity of California State
Route 54 (SR 54), which connects Interstate 5 (I-5) and El
Cajon, California.
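
As a small illustration of these two quantities, the sketch below computes weighted in-degree and out-degree from an edge dictionary. The toy graph and its weights are hypothetical and unrelated to the reported figures.

```python
from collections import defaultdict
from statistics import median


def weighted_degrees(edges):
    """Return (in_degree, out_degree) dictionaries for a weighted digraph.

    `edges` maps (source, target) pairs to the number of people moving
    from source to target.
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for (src, dst), weight in edges.items():
        out_deg[src] += weight  # people leaving src
        in_deg[dst] += weight   # people arriving at dst
    return dict(in_deg), dict(out_deg)


# Hypothetical toy graph.
edges = {("A", "B"): 3, ("A", "C"): 1, ("B", "C"): 2, ("C", "A"): 4}
in_deg, out_deg = weighted_degrees(edges)
print(in_deg, median(in_deg.values()))
print(out_deg, median(out_deg.values()))
```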



Figure 4. Communities of Tijuana and San Diego. Mexican side of the border is deeply connected to the San Diegan side; however, there are communities
in San Diego whose movements are completely restricted to the U.S. territory.



Figure 2. Concentration of nodes reveals important areas of San Diego: 
(A) Mission Beach, a popular recreational area and home to attractions 
such as Sea World; (B) Valley View Casino Center, the city’s main sports 
arena; (C) Downtown San Diego, and (D) UCSD. 



Figure 3. In the case of Tijuana, nodes were scarce and pretty evenly 
spaced, making the visual recognition of clusters difficult. 


Communities and modularity 

A community structure is the appearance of densely
connected groups of vertices that have only sparse connections
with other groups. Modularity is a measure of the quality of a
division of a network into communities or modules. Commonly
denoted by Q, it takes a real value between -1/2 (inclusive)
and 1 (exclusive), where higher values indicate the
presence of community structure within the network. Using
the Louvain method [8], seven distinct communities were
detected, with a value of Q = 0.707.
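
To make the quantity Q concrete, the sketch below evaluates modularity for a given partition. It is a simplification: it uses the standard formulation for undirected weighted graphs, whereas the mobility graph in this work is directed, and it only scores a partition; the Louvain optimization that searches for the partition is not shown. The toy graph is hypothetical.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an undirected weighted graph.

    `edges` maps (u, v) pairs to weights, each undirected edge listed once;
    `community_of` maps every node to its community label.
    """
    m = sum(edges.values())  # total edge weight
    internal = {}            # weight of edges inside each community
    degree_sum = {}          # sum of weighted degrees per community
    for (u, v), w in edges.items():
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0.0) + w
        degree_sum[cv] = degree_sum.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    # Q = sum over communities of (internal share - expected share).
    return sum(internal.get(c, 0.0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Hypothetical toy graph: two triangles joined by a single edge.
edges = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
         (3, 4): 1, (4, 5): 1, (3, 5): 1,
         (2, 3): 1}
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}
print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

Splitting the toy graph along the single bridging edge scores higher than keeping all nodes together, which is the behavior a community-detection method exploits.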

Figure 4 shows the communities in relation to the border.
Three out of the seven communities (communities 3, 4 and 5)
depict constant mobility across the U.S.-Mexico frontier.
Two other communities (communities 0 and 1) have only a
few nodes across the border. The remaining two communities
(2 and 6) contained all their mobility within the U.S. side of
the border. No community was restricted to the Mexican side.
These results hint towards three types of transnational
migration: (i) a flow that continuously crosses the border,
from Mexico up to Chula Vista, California and vice versa;
(ii) a group of people who live in San Diego City and have no
need to cross the border at all; and (iii) a group of people
commuting from Tijuana all the way up to La Jolla, California.
Communities that do not cross the border have a much more
confined mobility than their international counterparts.
Community 0 captures mobility from and to San Diego City’s
downtown, as well as movement from and to UCSD. The San Ysidro
Land Port of Entry (LPOE), Tijuana’s main border crossing, is
also represented inside community 0 as one of the points over
the border. Community 2 depicts movement from El Cajon, CA
and northwest areas towards La Jolla, CA. Communities 3
and 4 represent the mobility in Tijuana all the way north up
to National City, CA. Community 5 movement goes all the
way up to La Jolla, CA, where the limits of this study were
set. However, it is quite possible that human journeys from
this community would have reached the Los Angeles area had
the limit not existed. Figure 5 and Figure 6 illustrate three
of the communities that go across the border in greater
detail.

Figure 5. Community 4 spans movement between Tijuana and National
City, California. The sub-graph presented here was fully connected,
although the edges are again omitted for clarity.


Node Centrality 

A node’s centrality is a measure of the importance of this
particular node within the network [40]. Centrality measures
allow for the identification of key elements, which is
especially important in biology (e.g., to understand the main
disease spreaders) or social network analysis (e.g., to
identify influential people). Many of these centralities are
based on shortest paths linking pairs of nodes [11]. One such
centrality is Betweenness Centrality (BC) [5, 21], which
measures the ability of an individual node to control the
communication flow in the network [2]. BC is normally
calculated as the fraction of shortest paths between node
pairs that pass through the node of interest [42]. Extensions
to the BC definition made it applicable to weighted graphs
[12, 60]. Furthermore, several works have shown a positive
correlation between traffic congestion in a transportation
network and the BC measurement of its corresponding nodes.
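
The shortest-path idea behind BC can be illustrated with Brandes' algorithm. The sketch below is a simplification made for clarity: it handles directed but unweighted graphs and returns unnormalized scores, whereas the weighted extensions cited above are not implemented here. The adjacency-list layout and the toy graph are hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for a directed, unweighted graph.

    `adjacency` maps every node to the list of its successors; nodes
    without outgoing edges must still appear as keys.
    """
    bc = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies in reverse breadth-first order.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Hypothetical toy graph: a directed path A -> B -> C.
path = {"A": ["B"], "B": ["C"], "C": []}
print(betweenness_centrality(path))
```

In the toy path, only the middle node lies on a shortest path between two other nodes, so it is the only one with a non-zero score.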


Every node’s BC was calculated. The obtained measurements
lay between 0.0 and 2,708,509.9, with a median of 2,262.7.
This distribution was highly skewed to the right (i.e., only a
few nodes had huge values while the majority had really low
values). The highest value was found near Montgomery Field
Airport, just short of the intersection between Interstate
805, a major north-south highway in Southern California, and
SR 163. The second highest BC was found close to the San
Ysidro LPOE. The third highest BC was found at 32° 41’ 7.8” N
117° 02’ 22.0” W (the same node that scored the highest
out-degree earlier on), between SR 54, Woodman Street and
Briarwood Road. Once again, we believe this result was due to
the importance of SR 54 in connecting I-5 and El Cajon, CA. In
fourth place, there is a node at the Navy Lodge on the
Coronado peninsula. Finally, the fifth most important node is
located in Terra Nova Chula Vista, an apartment complex close
to I-805. We believe this is because this node captures
journeys traveling on I-805.

Figure 6. Community 5 that spans all the way up to La Jolla. Again, the
sub-graph was fully connected. Edges are omitted for clarity.

Validation of the results 

As Hawelka et al. [27] mention, it is difficult to find a
bias-free human mobility dataset that would enable direct
validation of results obtained with Twitter. A remaining
possibility would be to use existing traffic services such as Google
Maps or Waze. The latter even has its own public API for
traffic¹. However, obtaining information from these services
would require development effort that goes beyond the scope of
this work. Instead, we compare our results with
those presented by official government agencies reporting on
the status of San Diego’s infrastructure.

The most central node was found on SR 163, close to its
intersection with I-805. Interstate 805 (I-805) is a major
north/south freeway whose primary purpose is to provide an
alternative route for I-5 traffic in order to bypass the congested
Central Business District (CBD) [14]. It also serves as
a commuter route providing direct access to employment centers
in Otay Mesa, Kearny Mesa and Sorrento Valley. Along
with I-5, I-805 is an important corridor for the movement of
people and goods from Baja California and the U.S. - Mexico
border region to northern destinations. The exits
from I-805 towards SR-163, as well as the entry from SR-163
towards I-805, were found to be operating at a highly deficient
level of service [14]. In 2008, I-805 served almost 200,000
commuters on average every weekday. Its average weekday
traffic is projected to exceed 250,000 by 2030.

¹ https://www.waze.com/about/dev


The second most important node was just short of the San
Ysidro border crossing, one of the busiest land border
crossings in the world. Open 24 hours a day, 7 days a week,
it handles 50,000 vehicles and 26,000 pedestrians per day
[19, 58]. A 2005 study from the
San Diego Association of Governments, in cooperation with
the California Department of Transportation, found that San
Diego lost over $1.3 billion in potential revenues, 3 million
working hours, and 28,000 to 35,000 jobs because of excessive
border waits [52]. A recent U.S. expansion project
cut wait times at the San Ysidro border crossing to just
minutes.


According to BC, the third most important node was found
between SR 54 and Woodman St. SR 54 is a major east-west
facility serving intraregional traffic, providing access to
the communities in the South Bay, Spring Valley, Rancho San
Diego, and the cities of Chula Vista, National City and El
Cajon. SR-54 provides an alternative route to I-805, SR-94
and I-8. Travelers to Mexico can reach I-5, I-805, SR-194
and SR-125 by way of SR-54. In 2009, the SR-54 segments between I-5
and I-805, and between I-805 and Briarwood Road, scored a D-rank
level of service, with close to 126,000 and 118,000 average
weekday commuters, respectively. By 2030, this road is expected
to serve almost 150,000 people on an average weekday.


DISCUSSION 

We obtained 10.9 million tweets from the San Diego - Tijuana
border region. The proposed method infers the home location
of the users and their mobility through the region. By
applying a clustering procedure again to the home locations,
we were able to divide the urban space into neighborhoods
of similar density. These neighborhoods are data-driven, and thus
free from the constraints of current political divisions (e.g., they
do not have to follow streets or landmarks). We represent
the region’s mobility as a directed graph by using neighborhoods
as nodes, and assigning weights to the edges according
to the number of people living in one location and traveling
to another. We now discuss the main implications of our
findings, their main limitations, and possible future work.
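The graph construction summarized above can be sketched as follows.
This is a minimal illustration under stated assumptions, not the
pipeline used in this work: the function name, the input format (a
user-to-home mapping plus one (user, neighborhood) pair per tweet), and
the choice to ignore tweets sent from the home neighborhood are all
hypothetical.

```python
from collections import defaultdict


def build_mobility_graph(home_of, visits):
    """Build a directed, weighted mobility graph.

    home_of: dict mapping a user id to the id of the home neighborhood.
    visits:  iterable of (user id, neighborhood id) pairs, one per tweet.
    Returns a dict mapping (home, destination) edges to the number of
    distinct users who live in `home` and tweeted from `destination`.
    """
    travelers = defaultdict(set)
    for user, place in visits:
        home = home_of.get(user)
        # Assumption: skip users without an inferred home and
        # tweets sent from the home neighborhood itself.
        if home is None or place == home:
            continue
        travelers[(home, place)].add(user)
    return {edge: len(users) for edge, users in travelers.items()}


# Hypothetical toy data: two residents of A and one resident of B.
homes = {"u1": "A", "u2": "A", "u3": "B"}
tweets = [("u1", "B"), ("u1", "B"), ("u2", "B"), ("u3", "A"), ("u1", "A")]
edges = build_mobility_graph(homes, tweets)
```

Counting distinct users rather than tweets keeps a single prolific
account from dominating an edge weight.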


Implications 

Within the context of geo-located social networks, is there
any evidence that Tijuana and San Diego act like a trans-border
metropolis? We obtained the community structure of
San Diego County and Tijuana by using a community detection
algorithm on the mobility graph. We found seven communities
that explain the region’s human movement. Five of these
were formed by people who live on one side of the border and
constantly cross over to the other side. We believe most of this
movement is due to transnational migration, since no community
restricted its movement to the Mexican side. On the other
hand, some of the groups moved only within the San Diego city
region, in a pattern that seemed specific to people living
and commuting close to the city’s business district. These results
support the idea of a trans-border region closely linked
by infrastructure and by daily commuters crossing the border
for their economic and leisure activities. Our results
support Sparrow’s and Alegría’s claims that, on an
infrastructural and behavioral level, these cities are closely
linked. However, at the current time, we can neither support nor
refute Sparrow’s assertion on the lack of cultural and
politico-administrative integration of the region [57]. That said,
García-Gavilanes [23] already showed that trans-border
communication on Twitter is highly limited by language barriers
and differences in cultural factors, both of which are
present in the San Diego - Tijuana region.
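The distinction between communities that span the border and those that
stay on one side can be made explicit with a short check. The sketch
below assumes two hypothetical mappings, one from each node to its
detected community and one from each node to the country it lies in; it
illustrates the classification only, not the community detection
algorithm itself.

```python
def cross_border_communities(community_of, country_of):
    """Return the ids of communities whose nodes lie in more than one country.

    community_of: dict mapping a node id to its community id.
    country_of:   dict mapping a node id to a country label.
    """
    countries = {}
    for node, community in community_of.items():
        countries.setdefault(community, set()).add(country_of[node])
    return {c for c, seen in countries.items() if len(seen) > 1}


# Hypothetical toy data: community 1 spans the border, community 2 does not.
membership = {"n1": 1, "n2": 1, "n3": 2, "n4": 2}
location = {"n1": "US", "n2": "MX", "n3": "US", "n4": "US"}
spanning = cross_border_communities(membership, location)
```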

How do international borders affect the mobility of
the U.S. - Mexico border metropolitan region? Mobility
between San Diego and Tijuana is largely dictated by their
land ports of entry (LPOE), especially the main border
crossing in San Ysidro. Previous works have already
shown the importance of the San Ysidro LPOE in understanding
the economics, migration and mobility of labor in the region.
Even the U.S. General Services Administration has noted that
the San Ysidro LPOE has become a bottleneck in the system of
interchange between the two countries, increasingly restricting
the movement of passenger vehicles during peak times
[58]. Our centrality measurements confirmed this pattern.
The graph node corresponding to the San Ysidro LPOE was
ranked second in importance, just behind the SR 163 and
I-805 intersection.

Limitations and Extensions 

The conclusions of this paper are limited in scope by a sample
that is not representative of the general population, only of
Twitter users with geo-tagging enabled. Previous works have
noted that geo-located tweets account for around 1% of
all messages submitted to Twitter [27, 39]. However,
future work might circumvent this limitation by drawing on
additional geo-located public data sources. This would result
in a better understanding of the different behaviors of users
in different modes of transportation, improve the possible
routing of human transit, and reduce the sample bias that comes
from using only a Twitter sample.

Moreover, the geographical scope of our study was
limited to Southern California and Tijuana. Additional
explorations of our results hint at mobility flows heading
for Calexico, California and its sister city Mexicali, Baja
California. Future work might find that some communities’
mobility extends beyond the San Diego metropolitan
area.

CONCLUSIONS 

In this work, we have studied geo-located social-network
information obtained from Twitter to provide an empirical basis
to further understand the trans-border metropolis and its
transnational migration dynamics. Our aim was to understand
the daily commutes of people living in the San Diego - Tijuana
transnational metropolitan region. We have employed a
methodology capable of translating social-network information
into a graph representation. This graph and the analysis
that can be performed on it have considerable potential value
to policy makers and urban planners. Our methodology and
results are intended to provide urban planners and decision-makers
with a time-saving, easy-to-deploy, and cost-effective method
of sensing an urban environment in order to assess population
mobility and inform the design of public policy, particularly with
respect to transportation and immigration topics. This is
especially important when considering that urban planning in
a future trans-border megalopolis will require knowledge of
multiple (and highly distinct) cultural, social and economic
factors.